Sampling Variances and Covariances
of Parameter Estimates in Item Response Theory

Abstract

This paper develops a possible method for computing the asymptotic sampling variance-covariance matrix of joint maximum likelihood estimates in item response theory when both item parameters and abilities are unknown. For a set of artificial data, results are compared with empirical values; also with the variance-covariance matrices found by the usual formulas for the case where the abilities are known, or where the item parameters are known. The results are consistent with the conjecture that the new method is asymptotically correct except for errors due to grouping.



Sampling Variances and Covariances 
of Parameter Estimates in Item Response Theory*

In item response theory (IRT), the observations come in the form of an $n$-by-$N$ matrix, with one row for each item and one column for each examinee. The joint frequency distribution of the observations depends on a vector of $N$ 'ability' parameters, one for each person, and on a matrix of item parameters. Here, we will consider only the three-parameter logistic model for dichotomously scored items, so there will be three item parameters ($a$, $b$, and $c$) for each of $n$ items. A method will be developed for computing the asymptotic sampling variance-covariance matrix when both abilities and item parameters are unknown. Until this is done we do not know the standard errors of the parameter estimates, which handicaps development of a goodness-of-fit test and other statistics required in applications of IRT.

If the item (ability) parameters are known, the estimated ability (item) parameters have independent sampling distributions. It can be shown (see Bradley & Gart, 1962) that the maximum likelihood estimates of the ability (item) parameters are consistent. Hence the asymptotic sampling variance for an estimated ability parameter is given by the usual formula

$$\operatorname{Var}(\hat\theta_a \mid \mathbf{a}, \mathbf{b}, \mathbf{c}) = \left[ \mathcal{E}\left( \frac{\partial \ell}{\partial \theta_a} \right)^{2} \right]^{-1}, \tag{1a}$$

where $\hat\theta_a$ is the estimated ability parameter, $\ell$ is the log of the likelihood, and $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are the known vectors of item parameters.

*This work was supported in part by contract N00014-80-C-0402, project designation NR 150-453 between the Office of Naval Research and Educational Testing Service. Reproduction in whole or in part is permitted for any purpose of the United States Government.
Similarly, the asymptotic sampling variance-covariance matrix of the estimated item parameters for an item is given by

$$\left\| \operatorname{Cov}(\hat t_v, \hat t_w \mid \boldsymbol\theta) \right\| = \left\| \mathcal{E}\left( \frac{\partial \ell}{\partial t_v} \frac{\partial \ell}{\partial t_w} \right) \right\|^{-1} \qquad (v, w = 1, 2, 3), \tag{1b}$$

where $\{\hat t_v\}$ is a vector consisting of the estimated $a$, $b$, and $c$ for a single item and $\boldsymbol\theta$ is the known vector of abilities. The right-hand side is the inverse of a 3-by-3 matrix.

When neither item nor ability parameters are known, all parameters are often estimated simultaneously by maximum likelihood. In the (Rasch) case where there is only one parameter per item, Haberman (1977) has shown that all parameter estimates will converge to their true values (will be consistent) when the number of examinees and the number of test items become large simultaneously. Empirical results suggest that consistency probably also holds when all parameters are estimated simultaneously under the three-parameter model. If so, it is reasonable that the asymptotic sampling variance-covariance matrix of all estimated parameters be given by the usual formula

$$\left\| \operatorname{Cov}(\hat t_p, \hat t_q) \right\| = \left\| \mathcal{E}\left( \frac{\partial \ell}{\partial t_p} \frac{\partial \ell}{\partial t_q} \right) \right\|^{-1} \qquad (p, q = 1, 2, \ldots, M), \tag{2}$$

where $M \equiv 3n + N - 2$ and $\mathbf{t} \equiv \{t_p\} \equiv \{a_1, b_1, c_1, a_2, b_2, c_2, \ldots, a_n, b_n, c_n;\ \theta_1, \theta_2, \ldots, \theta_{N-2}\}$.
Since standard errors are urgently needed in practical work where all parameters are estimated simultaneously by maximum likelihood, this report compares numerical values provided by (2) with values provided by (1) and with empirically observed sampling fluctuations. The comparisons to be presented suggest that (2) provides useful values for the desired standard errors.

There are several special problems that arise in the evaluation and practical utilization of (2), problems that do not arise in the situation where (1) is appropriate:

1. Until an origin and scale are specified, the parameters are not identifiable.

2. The mathematical formulation is complicated by the choice of origin and scale.

3. The usual choice of origin and scale when estimating IRT parameters is inconvenient for mathematical purposes.

4. The numerical values of the sampling variances are very much affected by the choice of origin and scale.

5. Equation (2) requires the inversion of a matrix of order $N + 3n - 2$, where $N$ may be several thousand.

These problems will be considered in subsequent sections.

1. Parameterization

The appropriate likelihood function is (Lord, 1980)

$$L(U \mid \boldsymbol\theta, \mathbf{a}, \mathbf{b}, \mathbf{c}) = \prod_{i=1}^{n} \prod_{a=1}^{N} P_{ia}^{u_{ia}} Q_{ia}^{1 - u_{ia}}, \tag{3}$$

where $\boldsymbol\theta$ is the vector of the $N$ ability parameters; $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are each a vector of $n$ item parameters; $U \equiv \|u_{ia}\|$ is the matrix of item responses $u_{ia}$ (= 0 or 1); finally $Q_{ia} \equiv 1 - P_{ia}$ and $P_{ia}$ is the item response function, the probability of a correct answer by examinee $a$ to item $i$. Each given $P_{ia}$ is a function of $\theta_a$ and of $a_i$, $b_i$, and $c_i$, but not of any other parameters. In numerical work here, $P_{ia}$ will be taken to be the three-parameter logistic function

$$P_{ia} \equiv c_i + \frac{1 - c_i}{1 + \exp[-1.7 a_i (\theta_a - b_i)]}. \tag{4}$$

For mathematical purposes, however, it is only necessary to state that $P_{ia}$ is an increasing function of $\theta_a$.
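As an illustration, the logistic function (4) and the log of a likelihood of the usual product form (each response contributing $P_{ia}$ if correct and $Q_{ia}$ if not) might be coded as follows. This is only a sketch: the function names `p3pl` and `log_likelihood` and the toy data are hypothetical and are not part of LOGIST.

```python
import math

D = 1.7  # scaling constant appearing in equation (4)

def p3pl(theta, a, b, c):
    """Equation (4): probability of a correct answer under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def log_likelihood(u, thetas, items):
    """Sum over items i and examinees a of u*log(P) + (1 - u)*log(Q).

    u[i][a] is 0 or 1; items is a list of (a_i, b_i, c_i) tuples.
    """
    total = 0.0
    for i, (a, b, c) in enumerate(items):
        for ex, theta in enumerate(thetas):
            p = p3pl(theta, a, b, c)
            total += math.log(p) if u[i][ex] == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy data: 2 items (rows) and 3 examinees (columns).
items = [(1.0, 0.0, 0.2), (0.8, 0.5, 0.17)]
thetas = [-1.0, 0.0, 1.0]
u = [[0, 1, 1],
     [0, 0, 1]]
print(log_likelihood(u, thetas, items))
```

At $\theta_a = b_i$ the function returns $c_i + (1 - c_i)/2$, which gives a quick check on any implementation.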

If we add some constant to all $\theta_a$ and add the same constant to all $b_i$, all $P_{ia}$ will be unchanged, since (4) depends on $\theta_a$ and $b_i$ only through the difference $\theta_a - b_i$. This means that the origin used for measuring ability is entirely arbitrary. If we multiply each $\theta_a$ and each $b_i$ by some constant and divide each $a_i$ by the same constant, again all $P_{ia}$ will be unchanged. This means that the unit used to measure ability is entirely arbitrary. Since we can change the origin and unit of the $\theta_a$ without changing (3), it follows that $\boldsymbol\theta$, $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are not identifiable and cannot be estimated from (3) without further specification.
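The indeterminacy can be checked numerically. The sketch below assumes the logistic form (4), which involves $\theta_a$ and $b_i$ only through $a_i(\theta_a - b_i)$; shifting $\theta_a$ and $b_i$ by a common constant, or rescaling them while dividing $a_i$ by the same factor, leaves the probability unchanged. All numerical values are hypothetical.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic function of equation (4)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

theta, a, b, c = 0.4, 1.2, -0.3, 0.17   # hypothetical values
base = p3pl(theta, a, b, c)

# Change of origin: shift theta and b by the same constant.
shifted = p3pl(theta + 5.0, a, b + 5.0, c)

# Change of unit: multiply theta and b by a constant, divide a by it.
m = 2.5
rescaled = p3pl(m * theta, a / m, m * b, c)

print(base, shifted, rescaled)
```

The three printed probabilities agree up to rounding error, so the responses alone cannot distinguish among these parameter sets.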

To conform to a commonly used procedure, we could choose the origin and scale so that for some specified group of examinees the mean of the $\theta_a$ is zero and the variance is one. This is not convenient mathematically, however. Instead, two other methods of specifying the origin and scale will be used, even though this will complicate matters later on when the results are applied in practice. In the first method, without loss of generality, arbitrary numerical values will be assigned to $\theta_{N-1}$ and $\theta_N$. The $M = N + 3n - 2$ likelihood equations are

$$\frac{\partial \ell}{\partial t_p} = \sum_{i=1}^{n} \sum_{a=1}^{N} (u_{ia} - P_{ia}) \frac{P_p^{ia}}{P_{ia} Q_{ia}} = 0 \qquad (p = 1, 2, \ldots, M), \tag{5}$$

where $P_p^{ia} \equiv \partial P_{ia} / \partial t_p$.

2. Fisher Information Matrix

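The Fisher information matrix on the right of (2) now has as a typical element

$$I_{pq} \equiv \mathcal{E}\left( \frac{\partial \ell}{\partial t_p} \frac{\partial \ell}{\partial t_q} \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{a=1}^{N} \sum_{b=1}^{N} \frac{P_p^{ia} P_q^{jb}}{P_{ia} Q_{ia} P_{jb} Q_{jb}} \operatorname{Cov}(u_{ia}, u_{jb}) \qquad (p, q = 1, 2, \ldots, M).$$

Because of local independence and random sampling of examinees,

$$\operatorname{Cov}(u_{ia}, u_{jb}) = \delta_{ij} \delta_{ab} P_{ia} Q_{ia},$$

where $\delta_{st} = 1$ if $s = t$, $\delta_{st} = 0$ otherwise. Thus the typical element is

$$I_{pq} = \sum_{i=1}^{n} \sum_{a=1}^{N} \frac{P_p^{ia} P_q^{ia}}{P_{ia} Q_{ia}} \qquad (p, q = 1, 2, \ldots, M). \tag{6}$$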
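A small numerical sketch of (6) may help fix ideas. It assumes the logistic function (4), orders the parameters with the $3n$ item parameters first and the $N - 2$ free abilities last, and treats the last two abilities as fixed. The names `derivs` and `information` and the toy values are hypothetical.

```python
import math

D = 1.7

def derivs(theta, a, b, c):
    """Return P_ia and its partial derivatives w.r.t. (a_i, b_i, c_i, theta_a)."""
    L = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    P = c + (1.0 - c) * L
    g = (1.0 - c) * D * L * (1.0 - L)
    return P, (g * (theta - b), -g * a, 1.0 - L, g * a)

def information(items, thetas):
    """Information matrix with typical element sum_i sum_a P_p P_q / (P Q)."""
    n, N = len(items), len(thetas)
    free = N - 2                      # the last two abilities are held fixed
    M = 3 * n + free
    info = [[0.0] * M for _ in range(M)]
    for i, (a, b, c) in enumerate(items):
        for ex, theta in enumerate(thetas):
            P, (da, db, dc, dth) = derivs(theta, a, b, c)
            # P_ia depends only on the parameters of item i and, if it is
            # a free parameter, on the ability of examinee ex.
            idx = [(3 * i, da), (3 * i + 1, db), (3 * i + 2, dc)]
            if ex < free:
                idx.append((3 * n + ex, dth))
            for p, dp in idx:
                for q, dq in idx:
                    info[p][q] += dp * dq / (P * (1.0 - P))
    return info

# Hypothetical toy case: 2 items and 4 examinees, so M = 6 + 2 = 8.
items = [(1.0, 0.0, 0.2), (0.8, 0.5, 0.17)]
thetas = [-1.0, 0.0, 0.5, 1.0]
info = information(items, thetas)
print(info[0][3], info[6][7])  # different items; different examinees
```

Elements linking parameters of two different items, or the abilities of two different examinees, are exactly zero, which is the sparsity that the partitioned inversion of Section 3 exploits.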
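Note that $P_p^{ia}$ is zero unless either $p$ and $a$ refer to the same person, or $p$ and $i$ refer to the same item. Thus

$$\| I_{pq} \| = \begin{bmatrix} S & F \\ F' & T \end{bmatrix}, \qquad S \equiv \begin{bmatrix} S_1 & & \\ & \ddots & \\ & & S_n \end{bmatrix}, \quad F \equiv \| \mathbf{f}_{ia} \|, \quad T \equiv \begin{bmatrix} t_1 & & \\ & \ddots & \\ & & t_{N'} \end{bmatrix}, \tag{7}$$

where $N' \equiv N - 2$, $S_i$ is the 3-by-3 Fisher information matrix for $a_i$, $b_i$, and $c_i$; $t_a$ is the Fisher information for examinee $a$; and $\mathbf{f}_{ia}$ is the 3-by-1 joint Fisher information vector for item $i$ and examinee $a$:

$$\mathbf{f}_{ia} \equiv \frac{1}{P_{ia} Q_{ia}} \frac{\partial P_{ia}}{\partial \theta_a} \begin{bmatrix} \partial P_{ia} / \partial a_i \\ \partial P_{ia} / \partial b_i \\ \partial P_{ia} / \partial c_i \end{bmatrix}.$$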
3. Matrix Inversion



The following general formula for inverting a partitioned matrix may be applied to (7):

$$\begin{bmatrix} S & F \\ F' & T \end{bmatrix}^{-1} = \begin{bmatrix} S^{-1} + S^{-1} F Z^{-1} F' S^{-1} & -S^{-1} F Z^{-1} \\ -Z^{-1} F' S^{-1} & Z^{-1} \end{bmatrix}, \qquad \text{where } Z \equiv T - F' S^{-1} F. \tag{8}$$

The matrix $S$ is easily inverted since it is a diagonal supermatrix:

$$S^{-1} = \| S_i^{-1} \|.$$

The notation on the right denotes a diagonal matrix with diagonal elements $S_i^{-1}$. These last are easily computed since each is only 3 by 3.
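The standard partitioned-inverse identity, built on the matrix $Z \equiv T - F'S^{-1}F$, can be checked on a small example. The sketch below uses plain Python lists and a simple Gauss-Jordan routine; the helper names and the numerical values of $S$, $F$, and $T$ are hypothetical and chosen only so that the full matrix is invertible.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    aug = [[float(x) for x in row] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partitioned_inverse(S, F, T):
    """Inverse of [[S, F], [F', T]] for symmetric S and T, via Z = T - F'S^-1 F."""
    Sinv = inverse(S)
    SinvF = matmul(Sinv, F)
    FtSinvF = matmul(transpose(F), SinvF)
    Z = [[T[r][c] - FtSinvF[r][c] for c in range(len(T))] for r in range(len(T))]
    Zinv = inverse(Z)
    corr = matmul(matmul(SinvF, Zinv), transpose(SinvF))
    top_left = [[Sinv[r][c] + corr[r][c] for c in range(len(S))]
                for r in range(len(S))]
    top_right = [[-x for x in row] for row in matmul(SinvF, Zinv)]
    bottom_left = transpose(top_right)
    return ([tl + tr for tl, tr in zip(top_left, top_right)] +
            [bl + zi for bl, zi in zip(bottom_left, Zinv)])

# Hypothetical small example: S is 2 by 2, T is a 3-by-3 diagonal matrix.
S = [[4.0, 1.0], [1.0, 3.0]]
F = [[1.0, 0.5, 0.2], [0.3, 0.4, 0.1]]
T = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.5]]
full = [s + f for s, f in zip(S, F)] + [ft + t for ft, t in zip(transpose(F), T)]
blockwise = partitioned_inverse(S, F, T)
direct = inverse(full)
max_diff = max(abs(blockwise[r][c] - direct[r][c])
               for r in range(5) for c in range(5))
print(max_diff)
```

The block-by-block result agrees with direct inversion of the full matrix up to rounding error. In the application described here the gain comes from $S$ being block diagonal, so that only 3-by-3 inversions and the inversion of $Z$ are needed.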



All the matrix operations indicated on the right side of (8) can be carried out on the computer without difficulty, with one exception: the inversion of $Z$, which is $N'$ by $N'$. The approximation used here to invert $Z$ relies on grouping the $\theta_a$ into 16 class intervals of width 0.5, covering the range $-5 < \theta_a \le 3$. Each $\theta_a$ in a given class interval is replaced by the midpoint of the interval.
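The grouping step can be sketched as follows, assuming half-open intervals of the form (lower, upper] so that the 16 intervals exactly cover $-5 < \theta_a \le 3$. The function names are hypothetical.

```python
import math

LOW, WIDTH, GROUPS = -5.0, 0.5, 16

def class_interval(theta):
    """Index g = 0..15 with LOW + g*WIDTH < theta <= LOW + (g + 1)*WIDTH."""
    g = math.ceil((theta - LOW) / WIDTH) - 1
    if not 0 <= g < GROUPS:
        raise ValueError("theta outside the range -5 < theta <= 3")
    return g

def midpoint(g):
    """Midpoint that replaces every theta falling in class interval g."""
    return LOW + (g + 0.5) * WIDTH

for theta in (-4.9, 0.0, 0.1, 3.0):
    print(theta, class_interval(theta), midpoint(class_interval(theta)))
```

An ability of 0.1, for example, falls in the interval (0, 0.5] and is replaced by 0.25.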
Now $T$ will be a diagonal supermatrix $T \equiv \| T_g \|$, where $T_g \equiv t_g I_g$ is a scalar matrix with dimensions $N_g$ by $N_g$, and $N_g$ is the number of people in class interval $g$. Also, $F$ will be a row vector of 16 matrices, the columns of any one matrix being all identical:

$$F = \| \mathbf{f}_1 \mathbf{1}_1' \quad \mathbf{f}_2 \mathbf{1}_2' \quad \cdots \quad \mathbf{f}_{16} \mathbf{1}_{16}' \|,$$

where $\mathbf{f}_g \equiv \{ \mathbf{f}_{ia} \}$ for any examinee $a$ in class interval $g$ and $\mathbf{1}_g$ is a unit vector whose length is $N_g$.

The product $F' S^{-1} F$ can now be written as a 16-by-16 supermatrix:

$$F' S^{-1} F = \| \mathbf{1}_g \mathbf{f}_g' S^{-1} \mathbf{f}_h \mathbf{1}_h' \|.$$

Denote the scalar $\mathbf{f}_g' S^{-1} \mathbf{f}_h$ by $w_{gh}$. We now have

$$Z = T - \| M_{gh} \|, \tag{11}$$

$$M_{gh} \equiv w_{gh} \mathbf{1}_g \mathbf{1}_h'. \tag{12}$$

For computation purposes, $Z$ still has $N'$ rows and columns, not just 16. For the usual sample size, it is still not feasible to invert $Z$ with a standard inversion program.



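Consider the problem of inverting $Z_{11}$, the $N_1$-by-$N_1$ upper left corner of $Z$. By (11), (12), and a standard formula,

$$Z_{11}^{-1} = (T_1 - w_{11} \mathbf{1}_1 \mathbf{1}_1')^{-1} = T_1^{-1} + \frac{w_{11} T_1^{-1} \mathbf{1}_1 \mathbf{1}_1' T_1^{-1}}{1 - w_{11} \mathbf{1}_1' T_1^{-1} \mathbf{1}_1}. \tag{13}$$

Since $T_1 \equiv t_1 I_1$ here is scalar, this becomes

$$Z_{11}^{-1} = \frac{1}{t_1} I_1 + \frac{w_{11} \mathbf{1}_1 \mathbf{1}_1'}{t_1^2 (1 - t_1^{-1} w_{11} N_1)}.$$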
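The closed form for the inverse of a scalar matrix minus a constant matrix can be verified directly. The sketch below builds $Z_{11} = t_1 I - w_{11} \mathbf{1}\mathbf{1}'$, applies the rank-one update formula, and multiplies the two together; the values of `t`, `w`, and `N` are hypothetical.

```python
def z11(t, w, N):
    """Z11 = t*I - w*1*1', an N-by-N matrix."""
    return [[(t if r == s else 0.0) - w for s in range(N)] for r in range(N)]

def z11_inverse(t, w, N):
    """Closed-form inverse: (1/t)*I + w*1*1' / (t**2 * (1 - w*N/t))."""
    off = w / (t * t * (1.0 - w * N / t))
    return [[off + (1.0 / t if r == s else 0.0) for s in range(N)]
            for r in range(N)]

t, w, N = 2.0, 0.1, 5   # hypothetical values
A, Ainv = z11(t, w, N), z11_inverse(t, w, N)
product = [[sum(A[r][k] * Ainv[k][s] for k in range(N)) for s in range(N)]
           for r in range(N)]
max_err = max(abs(product[r][s] - (1.0 if r == s else 0.0))
              for r in range(N) for s in range(N))
print(max_err)
```

Because only two distinct numbers (the diagonal and off-diagonal values) describe the inverse, the block never needs to be stored or inverted at its full $N_1$-by-$N_1$ size.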



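Next, the upper left 2-by-2 supermatrix in $Z$ can be inverted as in (8), using the standard formula for the inversion of a partitioned matrix:

$$\begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix}^{-1} = \begin{bmatrix} Z_{11}^{-1} + Z_{11}^{-1} Z_{12} H^{-1} Z_{21} Z_{11}^{-1} & -Z_{11}^{-1} Z_{12} H^{-1} \\ -H^{-1} Z_{21} Z_{11}^{-1} & H^{-1} \end{bmatrix}, \tag{14}$$

where $H \equiv Z_{22} - Z_{21} Z_{11}^{-1} Z_{12}$. It can be seen that $H$ has the same general form as $Z_{11}$ and can thus be inverted as in (13); so (14) can readily be calculated.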

Next, substitute (14) for $Z_{11}^{-1}$ in the foregoing procedure, and repeat this procedure, in such a way as to invert the upper left 3-by-3 supermatrix in $Z$. A total of fifteen repetitions enable us to invert the 16-by-16 supermatrix $Z$. Equation (8) is now used for one final inversion, the result being the desired variance-covariance matrix of all $N + 3n - 2$ parameters.

The 16-by-16 variance-covariance supermatrix for the $\hat\theta_a$ consists of 256 blocks. The elements are all the same within a block except for diagonal blocks, each of which has a variance (instead of a covariance) repeated along its diagonal. Any two examinees in the same class interval will have identical $\operatorname{Var} \hat\theta$ and identical sampling covariances with any other given parameter estimate.

4. Reparameterization

In Section 1, in order to have identifiable parameters, an origin and scale was chosen so that $\theta_{N-1}$ and $\theta_N$ had arbitrary preassigned values. Any other choice of origin and scale would result in a linear transformation of parameters. The likelihood function would remain unchanged for every pattern of item responses.

The choice of unit (but not the choice of origin) has one completely obvious effect on the sampling errors of parameter estimates. If the unit is changed, the standard errors for the $b$'s and $\theta$'s will be multiplied by the ratio of the new scale unit to the old scale unit. The standard errors for the $a$'s will be divided by this ratio. A second important effect is easily overlooked: the standard error of the maximum likelihood estimator depends not only on the choice of scale, but also on how the (origin and) scale is specified.

Suppose that the true numerical values of all $\theta_a$ ($a = 1, \ldots, N$) are specified on some arbitrary scale. Suppose next that our test is too difficult for examinee $N$. This means that the likelihood function is rather insensitive to variations in $\theta_N$. If we could repeat our testing with several parallel test forms, we would find a wide range of estimates of $\theta_N$. In such a situation, the difference between true $\theta_{N-1}$ and $\theta_N$ clearly cannot be estimated well from the examinee responses. If we define the scale by treating $\theta_{N-1}$ and $\theta_N$ as known, our estimates of every $\theta_a$ may fluctuate grossly, simply because the scale unit $\theta_N - \theta_{N-1}$ is not well determined by the data.

Suppose next that we relabel all examinees so that examinees $N - 1$ and $N$ are not the same examinees as before. The ability scale has not been changed from the preceding paragraph; it is the procedure for defining the scale that has been changed. The true $\theta_a$ for each examinee is still the same as before. Suppose the new examinees $N - 1$ and $N$ are both at ability levels where our test measures accurately. If, further, the true $\theta_{N-1}$ and $\theta_N$ are substantially different from each other, the difficulty of the previous paragraph disappears: throughout the ability range where the test is designed to measure accurately, the standard errors of all $\hat\theta_a$ may be reasonably small.

For example, suppose on some scale $\theta_1 = -3$, $\theta_2 = -2$, $\theta_3 = -1$, $\theta_4 = 0$, $\theta_5 = 1$, $\theta_6 = 2$, $\theta_7 = 3$. We can specify this same scale in terms of any two of these $\theta$'s. The standard errors that we obtain will depend in an overwhelming way not just on the ability scale, but on how we specify it. We cannot rectify the standard errors by some simple procedure, such as multiplying each by a constant.

For this reason, our procedure for specifying the ability scale should depend only on parameters or functions of parameters that are accurately determined by the data. A robust mean of the $\hat\theta_a$ might seem attractive; however, any function of the $\hat\theta_a$ is counterindicated by the fact that sometimes $\hat\theta_a = \pm\infty$.

The procedure used here is to choose a set of $m$ discriminating, moderately easy items and a set of $r$ discriminating, moderately hard items. We will hereafter define the origin and unit for our new parameters, to be denoted by capital letters, so that the mean of the (true) $B$-parameters for the easy items is zero, and the mean for the hard items is one.

Our new parameters are related to our old parameters (from Section 2 or from Section 5) by linear transformations:

$$A_i \equiv k a_i, \qquad B_i \equiv K + b_i / k, \qquad C_i \equiv c_i, \qquad \Theta_a \equiv K + \theta_a / k, \tag{15}$$

$$(a = 1, 2, \ldots, N;\ i = 1, 2, \ldots, n)$$

where $k$ and $K$ are transformation constants to be determined. Since

$$\frac{1}{m} \sum^{m} B_i = 0, \qquad \frac{1}{r} \sum^{r} B_i = 1, \tag{16}$$

the values of $k$ and $K$ are found by substituting (15) into (16) and solving for $k$ and $K$:

$$k = \bar b_1 - \bar b_0, \qquad K = -\bar b_0 / k, \tag{17}$$

where $\bar b_0$ and $\bar b_1$ are means for the $m$ easy items and the $r$ hard items, respectively.
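As a worked illustration, requiring the transformed difficulties $K + b_i/k$ to average zero over the easy set and one over the hard set gives $k = \bar b_1 - \bar b_0$ and $K = -\bar b_0 / k$. The sketch below applies this to the lower-case $b$ values listed in Table 1, taking items 4-9 as the easy set and items 10-15 as the hard set as in Section 7; the variable names are hypothetical, and because the tabled values are rounded the results only approximate the constants quoted in the text.

```python
# Lower-case b values for items 1-15, as read from Table 1.
b = [-2.01, -1.61, -1.09, -0.77, -0.67, -0.34, -0.15, 0.00,
     0.11, 0.26, 0.46, 0.57, 0.68, 0.90, 1.16]

easy = range(3, 9)    # items 4-9 (0-based indices)
hard = range(9, 15)   # items 10-15

def mean(values):
    values = list(values)
    return sum(values) / len(values)

b0 = mean(b[i] for i in easy)   # mean difficulty of the easy set
b1 = mean(b[i] for i in hard)   # mean difficulty of the hard set

k = b1 - b0        # scale constant
K = -b0 / k        # origin constant

# Transform every difficulty to the upper-case scale: B_i = K + b_i / k.
B = [K + bi / k for bi in b]

print(round(k, 3), round(K, 3))
print(round(mean(B[i] for i in easy), 6), round(mean(B[i] for i in hard), 6))
```

By construction the transformed easy-set mean is zero and the hard-set mean is one, and the transformed value for item 1 reproduces the tabled $B = -1.75$ to two decimals.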

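To find the variance-covariance matrix for estimates of the upper-case parameters, rewrite (15) as

$$\Theta_a = \frac{\theta_a - \bar b_0}{k}, \qquad A_i = k a_i, \qquad B_i = \frac{b_i - \bar b_0}{k}, \qquad C_i = c_i, \tag{18}$$

with $k = \bar b_1 - \bar b_0$ as in (17).

Because of the special properties of maximum likelihood estimators, equations (18) still hold when estimators are substituted for parameters. Thus the sampling variances and covariances for estimates of the new parameters can be computed from the sampling variances and covariances already obtained at the end of Section 3. Formulas for doing this can be written down from (18) by using the 'delta' method (Kendall & Stuart, 1969, Chapter 10). For example,

$$\operatorname{Cov}(\hat A_i, \hat\Theta_a) = \operatorname{Cov}(\hat a_i, \hat\theta_a) - \operatorname{Cov}(\hat a_i, \hat{\bar b}_0) - \frac{\theta_a - \bar b_0}{k} \operatorname{Cov}(\hat a_i, \hat k) + \frac{a_i}{k} \left[ \operatorname{Cov}(\hat\theta_a, \hat k) - \operatorname{Cov}(\hat{\bar b}_0, \hat k) \right] - \frac{a_i (\theta_a - \bar b_0)}{k^2} \operatorname{Var} \hat k, \tag{19}$$

where, for instance,

$$\operatorname{Cov}(\hat{\bar b}_0, \hat k) = \operatorname{Cov}(\hat{\bar b}_0, \hat{\bar b}_1) - \operatorname{Var} \hat{\bar b}_0, \qquad \operatorname{Cov}(\hat{\bar b}_0, \hat{\bar b}_1) = \frac{1}{mr} \sum_{i}^{m} \sum_{j}^{r} \operatorname{Cov}(\hat b_i, \hat b_j).$$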

5. Parameter Estimation 



The maximum likelihood estimators (MLE) satisfy the likelihood equations (5). In (5), there is one equation for each parameter omitting $\theta_{N-1}$ and $\theta_N$. If all $N + 3n = M + 2$ MLE are linearly transformed, as for example in (15), the transformed parameters will still satisfy the likelihood equations.

Since the origin and scale for the new parameters is chosen to satisfy (16), then the appropriate $k$ and $K$ are obtained from (17) after replacing $\bar b_0$ and $\bar b_1$ by their MLE. The likelihood function (3) is unaffected by these linear transformations.

The computer program LOGIST identifies the parameters by still another choice of origin and scale:

1. a certain truncated mean of the $\hat\theta_a$ ($a = 1, 2, \ldots, N$) is set equal to zero,

2. a certain truncated standard deviation of the $\hat\theta_a$ is set equal to one.

We will use the usual lower-case symbols for parameters on this LOGIST scale. This should not cause confusion, since the lower-case parameters of Sections 1-3 will not be needed again.

If we start with LOGIST $\hat a_i$, $\hat b_i$, $\hat c_i$, and $\hat\theta_a$ and determine $\hat k$ and $\hat K$ so that $\hat{\bar B}_0 = 0$ and $\hat{\bar B}_1 = 1$, then the $\hat A_i$, $\hat B_i$, $\hat C_i$ ($i = 1, 2, \ldots, n$), and the $\hat\Theta_a$ ($a = 1, 2, \ldots, N$), calculated by substituting estimated values into (15), will still satisfy the likelihood equations. The upper-case parameter estimates so obtained should have the sampling variance-covariance matrix found theoretically at the end of Section 4. Our remaining task is to compare an empirically determined variance-covariance matrix of MLE's with the corresponding theoretical matrix.

6. Recapitulation

We have used, at different points, three different arbitrary scales for our parameters:

1. $\theta_{N-1}$ and $\theta_N$ assigned arbitrarily.

2. The origin is set at $\bar B_0$, the unit is $\bar B_1 - \bar B_0$.

3. The origin is set at a truncated mean of the $\hat\theta_a$, the unit is a truncated standard deviation of the $\hat\theta_a$.

Scale 1 (denoted by lower-case symbols) is most convenient mathematically for the difficult task of inverting the $M$-by-$M$ information matrix. Scale 1 is not useful for practical purposes, however, since its use grossly inflates all the sampling variances.

Scale 2 (denoted by upper-case symbols) seems the simplest choice in an attempt to keep the sampling error in the estimated origin and unit as small as possible. The sampling variances computed for scale 1 are transformed (see eq. 19) to values appropriate for scale 2. Although scale 2 is not the familiar one, the two item sets used to specify the scale can be chosen so that the numerical values of $A_i$, $B_i$, $C_i$ differ little from the familiar $a_i$, $b_i$, and $c_i$ produced by LOGIST.

Scale 3 (hereafter denoted by lower-case symbols) is the scale used by LOGIST.



7. Empirical Estimation Procedures

As already stated, our theoretical results can be trusted only if they are shown to be in reasonable agreement with empirical results. For this purpose, artificial data $U \equiv \|u_{ia}\|$ were created representing the administration of a 45-item test to a random sample of 1500 examinees. The 1500 $\theta_a$ were a spaced sample drawn from a distribution of abilities from a regular test administration. Six replicate matrices of $\|u_{ia}\|$ were independently generated, using the same item parameters and the same 1500 $\theta_a$. The variation in responses across these matrices thus represents random fluctuations in $u_{ia}$ for fixed $a_i$, $b_i$, $c_i$, and $\theta_a$.
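The generation of replicate response matrices for fixed parameters can be sketched as follows. This is not the program used for the study: the function names, the random seed, the short ability vector, and the choice of only two items are all hypothetical, and each response is simply drawn as 1 with probability given by the three-parameter logistic function.

```python
import math
import random

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def simulate_responses(items, thetas, rng):
    """One n-by-N matrix of 0/1 responses: row i is item i, column a is examinee a."""
    return [[1 if rng.random() < p3pl(theta, a, b, c) else 0 for theta in thetas]
            for (a, b, c) in items]

# Lower-case (a, b, c) for items 1 and 15 as read from Table 1.
items = [(0.99, -2.01, 0.17), (1.50, 1.16, 0.18)]
thetas = [-2.0 + 0.5 * j for j in range(9)]   # hypothetical fixed abilities

rng = random.Random(12345)                     # hypothetical seed
replicates = [simulate_responses(items, thetas, rng) for _ in range(6)]
for rep in replicates:
    print(rep)
```

Because the item parameters and abilities are held fixed, differences among the six matrices reflect only the random fluctuation of the responses.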



Further replication was also built in: items 16-30 and items 31-45 had the same item parameters as items 1-15. The true lower-case and upper-case item parameters are shown in Table 1 for items 1-15.

Six independent runs were made on LOGIST, one for each group of 1500 examinees. For each run separately, $\hat{\bar b}_0$ was calculated from items 4-9, 19-24, 34-39; $\hat{\bar b}_1$ was calculated from items 10-15, 25-30, 40-45. It is convenient for our ultimate interpretation of the standard errors to be obtained that the true $\bar b_1 - \bar b_0 = .671 - (-.305) = .976$. Since this is close to 1.0, the scale unit for the capitalized parameters is very close to the scale unit for the lower-case (LOGIST) parameters.

For each run separately, all lower-case parameter estimates were linearly transformed as in (15) to the upper-case scale, using estimated $k$ and $K$ values. For the data reported in subsequent sections, the true $k = .976$ and the true $K = .312$. Since the six runs are independent, an unbiased empirical estimate of the sampling variance of any parameter estimate $\hat T$ is given by
Table 1

True (Upper Case) Item Parameters

Item
No.       A       a        B        b     C or c
  1      .96     .99    -1.75    -2.01     .17
  2      .34     .35    -1.33    -1.61     .17
  3     1.34    1.38     -.80    -1.09     .17
  4      .76     .78     -.48     -.77     [illegible]
  5      .41     .42     -.38     -.67     .17
  6      .90     .92     -.04     -.34     .17
  7      .90     .92      .16     -.15     .17
  8     1.04    1.06      .31      .00     .17
  9     1.31    1.34      .42      .11     .13
 10     1.46    1.50      .58      .26     .34
 11      .85     .87      .79      .46     .17
 12      .60     .62      .90      .57     .17
 13     1.06    1.09     1.01      .68     .25
 14     1.36    1.39     1.23      .90     .29
 15     1.46    1.50     1.50     1.16     .18
$$s_{\hat T}^2 = \frac{1}{5} \left[ \sum \hat T^2 - \frac{1}{6} \left( \sum \hat T \right)^2 \right], \tag{20}$$

the sum being across the six LOGIST runs. If the $\hat T$ in (20) were normally distributed, $s_{\hat T}^2 / \sigma_{\hat T}^2$ would have an $F$ distribution with 5 and $\infty$ degrees of freedom.

Since three different items have identical item parameters, the $s_{\hat T}^2$ for a single item parameter can be averaged across these three items to yield the best available unbiased estimate:

$$\bar s_{\hat T}^2 = \frac{1}{3} \sum_{v=1}^{3} s_{\hat T_v}^2. \tag{21}$$

Note that it would be incorrect to pool all 18 values of $\hat T$ in an equation like (20), since $\hat T$ from the same LOGIST run are not independent.
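The two-stage computation (an unbiased variance across the six runs for each replicate item, then an average over the three replicate items) can be sketched as below. The function names and the eighteen numerical estimates are hypothetical, made up purely for illustration.

```python
def run_variance(estimates):
    """Unbiased sample variance of one parameter estimate across runs."""
    n = len(estimates)
    total = sum(estimates)
    return (sum(t * t for t in estimates) - total * total / n) / (n - 1)

def averaged_variance(replicate_items):
    """Average the per-item variances over replicate items with equal true values."""
    return sum(run_variance(r) for r in replicate_items) / len(replicate_items)

# Hypothetical estimates of one item parameter:
# three replicate items (rows), six independent runs each (columns).
runs = [
    [0.31, 0.28, 0.35, 0.30, 0.27, 0.33],
    [0.29, 0.34, 0.32, 0.26, 0.31, 0.30],
    [0.36, 0.30, 0.28, 0.33, 0.29, 0.32],
]
print(averaged_variance(runs))
print(averaged_variance(runs) ** 0.5)   # corresponding empirical standard error
```

Averaging the three variances, rather than pooling all eighteen values into one variance, respects the dependence among estimates that come from the same run.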

If $\hat T$ and $\hat S$ represent two different item parameters in the same item,

$$\bar s(\hat T, \hat S) = \frac{1}{3} \sum_{v=1}^{3} s(\hat T_v, \hat S_v), \tag{22}$$

which is the same as (21) except that covariances are substituted for variances. If $\hat T_i$ and $\hat S_j$ represent item parameters in different items, then there are nine different sample covariances to be summed:

$$\bar s(\hat T_i, \hat S_j) = \frac{1}{9} \sum_{v=1}^{3} \sum_{w=1}^{3} s(\hat T_{iv}, \hat S_{jw}). \tag{23}$$

If $T$ is an ability parameter, (20) still holds. For our purposes, replacing $T$ by $\Theta$, we can write

$$\bar s_{\hat\Theta}^2 = \frac{1}{N_g} \sum_{a=1}^{N_g} s_{\hat\Theta_a}^2, \tag{24}$$

where the sum is over all examinees in group $g$. When $\Theta$ is at the midpoint of interval $g$, this average should be roughly equal to the $\sigma_{\hat\Theta}^2$ obtained in Section 4.

If subscripts $a$ and $b$ denote different examinees in group $g$,

$$\bar s(\hat\Theta_a, \hat\Theta_b) = \frac{2}{N_g (N_g - 1)} \sum_{a > b} s(\hat\Theta_a, \hat\Theta_b), \tag{25}$$

where the sum is over all pairs of examinees in group $g$. If $a$ and $b$ denote examinees in groups $g$ and $h$ respectively ($g \ne h$), then

$$\bar s(\hat\Theta_a, \hat\Theta_b) = \frac{1}{N_g N_h} \sum_{a=1}^{N_g} \sum_{b=1}^{N_h} s(\hat\Theta_a, \hat\Theta_b). \tag{26}$$

Finally, if $\hat T_i$ is an item parameter and examinee $a$ is in group $g$, then

$$\bar s(\hat T_i, \hat\Theta_a) = \frac{1}{3 N_g} \sum_{v=1}^{3} \sum_{a=1}^{N_g} s(\hat T_{iv}, \hat\Theta_a). \tag{27}$$

In computing (24)-(27), examinees are grouped on their true values, not on their estimated values.

A problem arises when an examinee obtains a perfect score or a zero score. In this case his $\hat\theta$ is infinite and cannot be advantageously used. Instead of making some ad hoc adjustment, the 17 examinees for whom this occurred were simply removed from the group of examinees studied, leaving $N = 1483$. This has the effect of slightly biasing $\bar s_{\hat\Theta}^2$ for the remaining most extreme $\Theta$ values.

8. Numerical Standard Errors 



Since the $c$ parameter of an easy item usually cannot be accurately estimated, LOGIST in ordinary use does not estimate them individually. This would prevent the empirical standard errors of Section 7 from agreeing with the theoretical standard errors of Section 4. Since our main purpose is to show that the method of Section 4 can give useful results, the empirical and theoretical standard errors reported here are all estimated or calculated under the condition that the true values of $c_i$ are known for $i = 1, 2, 3, 4, 5, 12$. Items 1-5 are easy items; item 12 was included because of its low $a_i$. For empirical work, the true $c$ values were supplied to LOGIST, which held them fixed while estimating all other parameters. For theoretical work, the rows and columns of (7) corresponding to $c_1$, $c_2$, $c_3$, $c_4$, $c_5$, and $c_{12}$ were simply deleted from the information matrix (7) before inversion.

Table 2 compares the empirical standard errors of Section 7 for $\hat B$ with the theoretical standard errors of Section 4. The last three columns show the squared ratios for the three replications of each item; each of these ratios will have an $F$ distribution with 5 and $\infty$ degrees of freedom provided i) $\hat B$ has a normal sampling distribution, ii) $\hat B$ is unbiased, and iii) the theoretical $\sigma_{\hat B}$ from Section 4 is correct. An $F$ above 2.21 or below .229 is significant at the (two-tailed) 10 percent level. Eleven of the ratios are significant. The number of ratios less than 1 is approximately the same as the number of ratios greater than 1.
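The count of significant ratios can be reproduced from the last three columns of Table 2 with the stated cutoffs of .229 and 2.21. The sketch below simply transcribes those 45 squared ratios as read from the table; the variable names are hypothetical.

```python
LOWER, UPPER = 0.229, 2.21   # two-tailed 10 percent cutoffs quoted in the text

# Squared ratios for B from Table 2: one row per item, three replications each.
ratios = [
    [0.23, 0.56, 3.34], [1.76, 1.49, 0.93], [1.38, 0.59, 0.41],
    [0.90, 0.76, 1.17], [0.37, 0.40, 2.48], [0.28, 0.63, 2.63],
    [1.24, 0.65, 0.58], [2.31, 0.97, 0.16], [0.37, 2.63, 1.47],
    [3.19, 3.62, 0.33], [1.45, 2.55, 0.70], [0.85, 1.27, 0.66],
    [1.01, 1.20, 1.57], [1.19, 1.49, 3.75], [0.40, 2.62, 1.65],
]

flat = [x for row in ratios for x in row]
significant = [x for x in flat if x < LOWER or x > UPPER]
below_one = sum(1 for x in flat if x < 1.0)

print(len(significant), below_one, len(flat) - below_one)
```

Only the count of significant ratios is tied to a figure stated in the text; the split of ratios below and above 1 is printed for comparison with the qualitative remark that the two counts are roughly equal.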

In the past, the only available standard errors for item parameters assumed that the $\theta_a$ were known. Such standard errors for $\hat B$, for known $\theta$, are given in the second column of the table. A comparison of second and third columns shows very close agreement except for the three easiest items (1, 2, 3). For these three items, our new theoretical value is larger and agrees better with the empirical value. This gives support to the new theoretical values. The fact that the empirical values (from Section 7) tend to be larger than the theoretical (from Section 4) could be due to $n$ and $N$ not being large enough for asymptotic results. A second likely explanation is that LOGIST was not really run to complete convergence.

Table 3 makes comparisons for $\hat A$. Again the standard errors of $\hat A$ with $\theta$ unknown agree closely with the results when $\theta$ is known. The empirical standard errors, although correlating well with the theoretical, seem to be larger. Eleven of the $F$ ratios are
Table 2

Theoretical and Empirical Standard Errors for B

         sigma(B|theta)   sigma(B)     s(B)         s^2(B) / sigma^2(B),
Item     (theta known)    (Sect. 4)    (Sect. 7)    three replications
No.
  1*         .110           .156         .183        .23     .56    3.34†
  2*         .186           .201         .237       1.76    1.49     .93
  3*         .045           .071         .063       1.38     .59     .41
  4*         .060           .068         .066        .90     .76    1.17
  5*         .100           .099         .103        .37     .40    2.48†
  6          .125           .121         .131        .28     .63    2.63†
  7          .113           .110         .100       1.24     .65     .58
  8          .084           .083         .088       2.31†    .97     .16†
  9          .055           .055         .067        .37    2.63†   1.47
 10          .069           .069         .106       3.19†   3.62†    .33
 11          .100           .097         .122       1.45    2.55†    .70
 12*         .094           .091         .087        .85    1.27     .66
 13          .086           .083         .094       1.01    1.20    1.57
 14          .077           .076         .111       1.19    1.49    3.75†
 15          .072           .075         .093        .40    2.62†   1.65

†Significant at 10 percent level.
*The C parameter for these items is treated as known.
Table 3

Theoretical and Empirical Standard Errors for A

         sigma(A|theta)   sigma(A)     s(A)         s^2(A) / sigma^2(A),
Item     (theta known)    (Sect. 4)    (Sect. 7)    three replications
No.
  1*         .088           .105         .141        .95     .91    3.60†
  2*         .044           .046         .039        .88     .51     .74
  3*         .097           .117         .094       1.39     .32     .22†
  4*         .060           .065         .080        .89    2.77†    .86
  5*         .045           .047         .054        .63    2.44†    .93
  6          .103           .102         .123       1.54     .30    2.51†
  7          .105           .105         .147       1.30    2.25†   2.35†
  8          .113           .115         .159       1.29    3.20†   1.29
  9          .123           .128         .182       1.89    3.39†    .80
 10          .184           .193         .160        .71     .55     .79
 11          .115           .120         .132       1.42    1.85     .34
 12*         .060           .060         .076        .95    2.94†    .94
 13          .151           .157         .187       2.40†   1.08     .79
 14          .209           .218         .240       1.32     .91    1.43
 15          .222           .233         .182        .25     .65     .93

†Significant at 10 percent level.
*The C parameter for these items is treated as known.
significant. Similar statements apply to Table 4, which shows the comparisons for $\hat C$.

Table 5 compares standard errors for $\hat\Theta$. Let us leave column 3 for later discussion. Columns 4 and 5 show standard errors of $\hat\Theta$ corresponding to the $\Theta$ value in the first column; column 6, however, is computed from (24) for the group of $N_g$ people falling in the class interval with midpoint $\Theta$. There is good agreement between empirical and theoretical standard errors except for $\Theta < -1.5$. For low $\Theta$, asymptotic results do not appear to hold with the usual $n$ and $N$.

Table 5 shows close agreement of our standard error from Sections 2-4 with the standard error of $\hat\Theta$ when the item parameters are known. The agreement shown here and in previous tables suggests that (1) is a good approximation to the diagonal of (2), and similarly for item parameters, and that (2) agrees well with the empirical standard errors.

A comparison of the third and fifth columns in Table 5 shows what happens to $\sigma_{\hat\Theta}$ when all $C_i$ must be estimated from the data: for $\Theta < -1$, $\sigma_{\hat\Theta}$ is sharply affected; for $0 < \Theta < 2.5$, there is very little effect.

Table 6 contains the squared ratios of the empirical standard errors to the theoretical standard errors for the five $\Theta$ closest to the midpoint of the intervals, and within at least .1 of the midpoint. Two of the groups had only two abilities within this restriction. If similar caveats apply as for the item parameters, these ratios will have an $F$ distribution with five and $\infty$ degrees of freedom. Only eight of the ratios are significant at the two-tailed 10% level, and only 16 are greater than 1.






Table 4

Theoretical and Empirical Standard Errors in C

         sigma(C|theta)   sigma(C)     s(C)         s^2(C) / sigma^2(C),
Item     (theta known)    (Sect. 4)    (Sect. 7)    three replications
No.*
  6          .056           .058         .063        .39     .44    2.79†
  7          .049           .050         .038        .40     .35     .95
  8          .037           .037         .045       3.08†    .76     .43
  9          .024           .025         .039        .80    4.71†   1.83
 10          .025           .026         .034       2.24†   2.68†    .27
 11          .036           .037         .043        .98    2.67†    .41
 13          .026           .027         .037        .89    1.88    2.90†
 14          .019           .020         .028       2.98†   2.55†    .43
 15          .015           .015         .016        .64    1.23    1.71

†Significant at 10 percent level.
*C_1, ..., C_5, and C_12 treated as known.
Table 5

Theoretical and Empirical Standard Errors for $\theta$

                     All $C_i$                        $C_1$ to $C_5$ and $C_{12}$
 $\theta$       N      unknown    $\sigma_{\theta|a,b,c}$     treated as known
 -2.75      10      2.090       .951            .966
 -2.25      35      1.296       .686            .699
 -1.75      93       .861       .516            .525
 -1.25     219       .607       .400            .404
  -.75     332       .456       .341            .342
  -.25     326       .349       .295            .295
   .25     227       .278       .262            .263
   .75     136       .261       .260            .261
  1.25      77       .303       .289            .290
  1.75      25       .422       .384            .387
  2.25       3       .628       .575            .580
  2.75       0       .931       .874            .878

Empirical standard errors (sixth column), entries in printed order:
  *   1.134   .797   .427   .332   .279   .274   .286   .349   .412   *

*Not computed because of small N.
Table 6

F Ratios for $\hat{\theta}$

 $\theta$                       $s^2/\sigma^2$
 -2.75      3.73†    4.41†
 -2.25       .85      .78      .43    11.34†    1.16
 -1.75       .57     1.90     1.62      .32    18.95†
 -1.25       .98      .63      .96      .95      .77
  -.75       .26      .94      .63      .81      .63
  -.25       .71     1.81      .73      .04†     .48
   .25       .18†     .98      .74      .80      .77
   .75       .61      .35     1.41     1.21      .64
  1.25      2.76†    1.82      .98     1.08     1.84
  1.75       .67      .41     1.08     1.45     1.78
  2.25       .11†     .36
  2.75‡

†Significant at 10 percent level.

‡There were no $\theta$ between 2.65 and 2.85.
Table 7 presents the theoretical standard errors of A, B, and
C, obtained by the method of Sections 2-4, when all $C_i$ must be
estimated from the data. It is interesting to compare these values with
those in Tables 2-4, where $C_1, \ldots, C_5$ and $C_{12}$ were treated as
known. We find that the standard errors of $B_1$ to $B_5$ are increased
drastically by ignorance of $C_1$ to $C_5$; all other $\sigma(B_i)$ are much
increased, except for i = 11, 13, and 14. All $A_i$ show sharply
increased standard errors. For items for which $C_i$ must be estimated,
on the other hand, the standard errors of $C_i$ are little affected by
knowledge or ignorance of $C_1, \ldots, C_5, C_{12}$. A likely explanation
for this is that errors in estimating the scale unit affect the standard
errors of the $A_i$ and the $B_i$, but not of the $C_i$.

We have found in Tables 2-7 some illustrative answers to the
question: How do estimation errors on one set of items affect the
accuracy of estimated parameters for a different set of items? Such
effects could not be quantified until now since the standard error of
an item parameter estimate was previously known only for fixed $\theta$.
It is only through the sampling fluctuations of $\hat{\theta}$ that estimation
errors for one item can affect parameter estimates for another item.

With 18 of the $C_i$ treated as known, the Fisher information matrix
inverted for this study has 3 x 45 - 18 + 1498 = 1615 rows and columns.
The matrix inversion by the method of Section 4 used 1232K bytes of
memory on an IBM 3031 and took 32 seconds. The computer program dealt
with a 45-item test; it did not take advantage of the fact that the 45
items consisted of 3 replicate sets of 15 items each.
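
The order of the information matrix, and a rough idea of what dense storage of such a matrix would cost, can be checked with a few lines of Python. The sketch below is illustrative only: the 8-byte word size and the reading of "K" as 1024 bytes are assumptions, and the variable names are hypothetical.

```python
n_items = 45
params_per_item = 3          # a, b, and c for each item
n_known = 18                 # parameters treated as known
n_examinees = 1498           # one ability parameter per examinee

# Number of free parameters = order of the information matrix.
order = params_per_item * n_items - n_known + n_examinees

bytes_per_word = 8           # assumption: 8-byte double-precision words
dense_kbytes = order * order * bytes_per_word / 1024
triangle_kbytes = order * (order + 1) // 2 * bytes_per_word / 1024

print(order)                       # order of the matrix
print(round(dense_kbytes))         # full square matrix, in K bytes
print(round(triangle_kbytes))      # one symmetric triangle, in K bytes
```

Under these assumptions both the full matrix and a single symmetric triangle would need far more than the 1232K bytes reported, which suggests (as an inference from the arithmetic, not a statement in the text) that the method of Section 4 avoids holding the whole matrix in dense form.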
Table 7

Standard Errors (2) of Item Parameters when
All $C_i$ Must be Estimated

Item
No.        A        B        C
  1       .52      .23      .60
  2       .54      .13      .72
  3       .35      .32      .10
  4       .26      .15      .14
  5       .97      .10      .32
  6       .19      .18      .07
  7       .16      .18      .06
  8       .14      .21      .041
  9       .12      .26      .026
 10       .11      .32      .026
 11       .10      .18      .039
 12       .18      .14      .07
 13       .09      .23      .027
 14       .08      .31      .020
 15       .10      .33      .015
In order to verify the numerical accuracy of the inversion, the
information matrix and the variance-covariance matrix were multiplied.
The result was an identity matrix accurate to 10 decimal places. The
variance-covariance matrix obtained in double precision agreed with the
matrix obtained in quadruple precision to all six decimal places printed.
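
The two checks described above can be illustrated on a small example. The Python sketch below inverts a hypothetical 4 x 4 symmetric positive definite matrix by Gauss-Jordan elimination, multiplies the matrix by its inverse, and compares the floating-point inverse with one computed in exact rational arithmetic. The exact computation stands in for the quadruple-precision run; the matrix, the elimination routine, and all names are assumptions made for illustration and are not the method of Section 4.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (float or Fraction entries)."""
    n = len(matrix)
    # Augment with the identity: [M | I].
    aug = [list(row) + [1 if i == j else 0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Bring the largest remaining entry of this column to the pivot row.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot_row][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def max_identity_error(product):
    """Largest absolute deviation of a square matrix from the identity."""
    n = len(product)
    return max(abs(product[i][j] - (1 if i == j else 0))
               for i in range(n) for j in range(n))


# Hypothetical symmetric positive definite "information" matrix.
info = [[4.0, 1.0, 0.5, 0.2],
        [1.0, 3.0, 0.3, 0.1],
        [0.5, 0.3, 2.0, 0.4],
        [0.2, 0.1, 0.4, 1.0]]

cov = invert(info)
err = max_identity_error(matmul(info, cov))

# Higher-precision reference: the same elimination in exact arithmetic.
info_exact = [[Fraction(str(x)) for x in row] for row in info]
cov_exact = invert(info_exact)
max_diff = max(abs(cov[i][j] - float(cov_exact[i][j]))
               for i in range(4) for j in range(4))

print(err < 1e-10, max_diff < 5e-7)
```

The first flag corresponds to the identity check "accurate to 10 decimal places"; the second treats agreement to six decimal places as a difference of less than half a unit in the sixth place.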



9. Sampling Covariances and Correlations



When item parameters are known, $\hat{\theta}_a$ and $\hat{\theta}_b$ ( $a \neq b$ ) are
uncorrelated. When ability parameters are known, estimated item
parameters for different items are uncorrelated. When both item and
ability parameters are estimated, in general all estimates are correlated.
The computer printout of the sampling correlations for the present
study consists of 10 correlation matrices. These need only be
summarized here.

Table 8 shows the theoretical ( T ) and empirical ( E ) correlations
between estimates of two different parameters for the same item. The
correlations are generally substantial. For comparison, the theoretical
correlations when the abilities are known are included. The empirical
correlations are obtained by dividing the estimated sampling covariance
by the product of the square roots of the estimated sampling variances.
If the empirical correlations here have roughly 15 degrees of freedom,
their standard error is roughly $(1 - \rho^2)/\sqrt{15} = .26(1 - \rho^2)$.
In view of their standard errors, there is very satisfactory agreement
of empirical with theoretical correlations.
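
The two computations in this paragraph are short enough to write out. The Python sketch below converts a sampling covariance and two sampling variances into a correlation and evaluates the usual large-sample approximation $(1 - \rho^2)/\sqrt{df}$ for its standard error with 15 degrees of freedom. The numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def correlation(cov_xy, var_x, var_y):
    """Correlation = covariance / (sd_x * sd_y)."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


def correlation_se(rho, df=15):
    """Approximate standard error of a correlation: (1 - rho^2) / sqrt(df)."""
    return (1.0 - rho ** 2) / math.sqrt(df)


# The multiplier 1 / sqrt(15) quoted in the text, to two decimals.
print(round(1.0 / math.sqrt(15), 2))

# Hypothetical covariance and variances, for illustration only.
r = correlation(0.012, 0.04, 0.01)
print(round(r, 3), round(correlation_se(r), 3))
```

Because the standard error shrinks as the correlation moves away from zero, the large correlations in Table 8 are determined more precisely than the small ones for the same number of degrees of freedom.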

Table 9 shows both theoretical and empirical correlations for
the $B_i$ ( $i = 1, 2, \ldots, 15$ ). The corresponding standard errors are
Table 8

Theoretical ( T ) and Empirical ( E ) Sampling Correlations Between
Two Parameter Estimates for the Same Item


Table 9

EXPERIMENTAL (E) AND THEORETICAL (T) STANDARD ERRORS (DIAGONALS) AND
CORRELATIONS FOR TRANSFORMED B (DECIMAL POINTS OMITTED)

B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15



(183) 
(156 ) 

1^1 
26^ 

28^ 
509 

0^5 
33<» 

193 
158 



158 
124 

•028 
■128 

■004 
■088 

014 
■040 

•122 
034 

289 
■001 

016 
0 39 

■031 
•005 

■064 
■0 22 

■147 
0 50 



141 
264 

(237) 
( 20.1 ) 

541 
284 

286 
184 

-092 
078 

r036 
-066 

126 
-069 

-005 
-0 46 

-252 
-017 

-10 5 
026 

029 
. 007 

-064 
022 

-247 
-002 

360 
-018 

-04 0 
-039 



( 063 ) 
( 071 ) 

30 8 
377 

040 
151 

-091 
-131 

0 5 6 
-139 

004 
-093 

-274 
-032 

-279 
048 

?68 

.';08 

007 
044 

-298 
-008 

348 
-032 

-155 

-06-8- 



045 
3 34 

286 • 
184 

308 

,3 77 

(066) 
(a68) 



193 -: 
158 -: 



1,28 Vo04 
12\a -i,0 88 



014 -122 289 
040 034 -001 



120 

0=6\6 

-072 
-130\ 



040 - 
151 - 

120 - 
066 - 

-(103)- 
(099)- 



■036 
■066 - 

091 
131 - 

072 - 
130 - 

228 - 
117 - 



-252 -105 
-017 026 



274 
0 32 



■279 
048' 




0 16 -0 31 
039 -005 



029 -064 -247 
007 022 -002 



268 
008 



007 
044 

192 
028 



126 -07 

113 -088 

014 076 

■062 -053 



038 
-089 

-36^ 
-040 

-308 
029 

-007 
003 

192 
028 

-443 
-005 

343 
-018 

2 lE^ 
-039 



-051 
-0 0 3 

-193 
001 

—1-26 
002 



•041 
■051 



107 
■016 

•153 
004 

002 , 
•0O5 

016 
0 0 1 

122 

■ 6^8 5\ 
0 11 \ 



100)-120 
1 10)-042 

120 (088) 
■042 (083) 

,098 -068 
036 -007 



308 -0 0 7 
0 2 9 \0 0 3 

23 6 0 86 
-a0 9 013 

1\07 -15\3 0 02 
•0l6 . 004 -005 

12l\-050' -221 
- 0 3^ -011 X 002- ;- 0 0 9- 



-0 68\ 0 2^ 
-007 \p04 

(067) 198 
(055) 0^ 3 



-ai5 lOl 
1 -qio 

037, -129- 
-0 0 3\-pl3 



■2 98 
■008 

•443 
■005 

-,0 5 1 

J0>0 3 

016 
00 1 

156 
002 

■062 
002 

332 
002 



a2i 

V 


0?5 
004 


198 
023 


( loX) 

(069\) 


-193 
-035 


-1137 
-0 52 


270 
-043 


0 5,0. 
0 02 


-0 15 
001 


037 
-003 


-193 
-035 


\ 1 2 2 ) 

(No 9 7.) 


041 
-071 


-Oil 
-0 6-7 


22 r 

009 


\ 1 0 1 

Voio 


-129 
-013 


-137 
-052 


o\i 
-07a 


(087) 
(091) 


-176 
-069 


156 
002 


-^0 6 2 
0.0 2 


332 
0 02 


270 
-043 


-01 l\ 
-067^ 


-176 

\-0i9:_ 


(094) 

(-083') 


0 18 

00,5 


081 

.^-oor 


- 3,5 7^ 
" 000 


— ^i-sr' 

-062 


-1^3 
-086 


^78 
-Ov68 


-341 
-057 


000 
Oil 


-137 
003 


-063 
-005 


-098 
-087 


-182 
-107 


\0 0^5 
-0 6 5v 


-112 
-0 6 0 



-064 
-022 

360 
-0 18 

348 
-032 

343 
-018 
given in parentheses in the diagonal. The only theoretical correlations
above .20 are among $B_1$, $B_2$, $B_3$, and $B_4$. These are the
four easiest items. Any error in estimating the scale unit
would seriously affect all these items in the same way. It is hard
to draw other useful generalizations from this table.

The corresponding table for the $A_i$ ( $i = 1, 2, \ldots, 15$ ) shows
only 3 theoretical correlations above .20 (.27, .20, and .23). With two
exceptions (-.013 and -.002), all theoretical correlations are positive.

The highest theoretical correlation among the $C_i$ ( i = 6, 7, ...,
11 and 13, 14, 15 ) is .04. All correlations are positive.

The theoretical correlations between $A_i$ and $B_j$ ( $i \neq j$ ) are
all below .20 in absolute value, except for items 1-4, which vary from
.14 to .38. For $B_i$ and $C_j$ ( $i \neq j$ ; $j \neq 1, 2, \ldots, 5, 12$ )
there are no correlations above .25 in absolute value. For $A_i$ and
$C_j$, there are no correlations above .20 in absolute value.

The theoretical correlations between $\hat{\theta}_a$ and $\hat{\theta}_b$ ( $a \neq b$ )
are all less than .04 in absolute value. Between $\hat{\theta}_a$ and $B_i$, the
largest correlation in absolute value is .15 (when i = 1 and
$\theta = -2.25$ ). Between $\hat{\theta}_a$ and $A_i$, the largest is .12 (when
i = 1 and $\theta = -2.25$ ). Between $\hat{\theta}_a$ and $C_i$, the largest is .06.
Summary

When both abilities and item parameters are unknown, the asymptotic
sampling variance-covariance matrix developed in this paper appears to
provide useful values for the standard errors needed for further
research in item response theory. The magnitudes of the numerical
values in the matrix were very much affected by the method used to
define the scale. For a set of artificial data, this variance-
covariance matrix compared satisfactorily with empirical results;
also with the variance-covariance matrices found by the usual formulas for
the case where the abilities are known or where the item parameters are
known.

With this matrix, the effect on other items of including items
with poorly determined parameters can be studied. Including items with
poorly determined c's increases the standard errors of all of the a's
and b's but not of the other c's. The effect of different distributions
of abilities on the accuracy of item parameters can also be studied.
Hopefully a goodness-of-fit test can now be developed for the three-
parameter model.

The standard errors of item parameters can now be studied for a
situation of common occurrence in equating and item banking: Each of
two tests containing common items is administered to a different group
of examinees; all parameters are estimated in the same LOGIST run.
It is of particular interest to determine how the number of common items
affects the standard error of the parameter estimates.
